SemanticScuttle - klotz.me » Tags: data science

Semantic Telemetry: Understanding how users interact with AI systems

This blog post introduces the Semantic Telemetry project at Microsoft Research, which uses a data science approach to analyze how people interact with AI systems, specifically focusing on Copilot in Bing usage. It discusses the complexity of human-AI interactions and how they differ from traditional search.

Topics: Copilot in Bing chats were categorized by topic. Technology (21%) was the most common topic, followed by Entertainment (12.8%), Health (11%), and others. Within Technology, programming and scripting were prominent subtopics.
Platform Differences: Mobile users tend to use Copilot for personal tasks, while desktop users engage in more professional activities.

2025-03-10 Tags: semantic telemetry, llm, microsoft research, copilot, bing, hci, weiwei yang, data science by klotz

How to Work With Polars LazyFrames

Learn how to create and use Polars LazyFrames for efficient data processing. Discover lazy evaluation, predicate and projection pushdown, and how to handle large datasets.

2025-02-28 Tags: polars, lazyframe, data science, pandas, spark by klotz
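
Lazy evaluation and pushdown are easier to see in a toy model. The sketch below is a hypothetical `LazyQuery` class in plain Python, not the Polars API. Calls to `filter` and `select` only record the query. `collect()` then reads just the columns the query needs (projection pushdown) and applies filters during the scan, before any output row is built (predicate pushdown). In Polars itself you build the query on a LazyFrame (for example one from `pl.scan_csv(...)`) and call `.collect()`, and the query optimizer performs these rewrites.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Column-oriented toy table: column name -> list of values (hypothetical data).
Table = Dict[str, List[object]]


class LazyQuery:
    """Hypothetical toy lazy query, NOT the Polars API: operations are recorded
    and only executed when collect() is called."""

    def __init__(self, table: Table):
        self._table = table
        self._filters: List[Tuple[str, Callable[[object], bool]]] = []
        self._columns: Optional[List[str]] = None
        self.columns_read: List[str] = []  # filled in by collect()

    def filter(self, column: str, predicate: Callable[[object], bool]) -> "LazyQuery":
        self._filters.append((column, predicate))  # recorded, not executed
        return self

    def select(self, *columns: str) -> "LazyQuery":
        self._columns = list(columns)  # recorded, not executed
        return self

    def collect(self) -> List[Dict[str, object]]:
        wanted = self._columns if self._columns is not None else list(self._table)
        # Projection pushdown: touch only the output columns plus the filter columns.
        needed = list(dict.fromkeys(wanted + [c for c, _ in self._filters]))
        self.columns_read = needed
        data = {c: self._table[c] for c in needed}
        n_rows = len(data[needed[0]]) if needed else 0
        rows = []
        for i in range(n_rows):
            # Predicate pushdown: filters run during the scan, before a row is built.
            if all(pred(data[c][i]) for c, pred in self._filters):
                rows.append({c: data[c][i] for c in wanted})
        return rows


listings = {
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
    "price": [120, 45, 80, 30],
    "rating": [4.5, 4.1, 3.9, 4.8],
}
query = LazyQuery(listings).filter("price", lambda p: p > 50).select("city")
print(query.columns_read)  # [] (nothing has run yet)
print(query.collect())     # [{'city': 'Oslo'}, {'city': 'Oslo'}]
print(query.columns_read)  # ['city', 'price'] ('rating' is never read)
```

The `rating` column is never touched, and rows that fail the filter are never assembled. On a large file this is where lazy execution saves memory and I/O.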

Retrieval Augmented Generation in SQLite

The article explores Retrieval-Augmented Generation (RAG) in SQLite, using the sqlite-vec extension and the OpenAI API. It outlines a simplified approach to RAG that avoids complex frameworks and cloud vector databases and relies instead on SQLite virtual tables for vector-based semantic search.

2025-02-20 Tags: rag, llm, sqlite, sqlite-vec, vector search, machine learning, data science by klotz
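
Here is a minimal sketch of the retrieval step, assuming only Python's built-in `sqlite3` module. It does not use sqlite-vec. The hypothetical `embed` function is a toy word-count embedding standing in for an OpenAI embeddings call. The search is a brute-force cosine similarity registered as a SQL function, not a sqlite-vec `vec0` virtual table. The LLM call is omitted; `build_prompt` only shows how retrieved text would be placed into the prompt.

```python
import json
import math
import sqlite3
from typing import List

# Hypothetical toy embedding: word counts over a tiny fixed vocabulary.
# In the article's setup this would be a call to the OpenAI embeddings API.
VOCAB = ["sqlite", "vector", "search", "pandas", "dataframe", "rag"]


def embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(a_json: str, b_json: str) -> float:
    """Cosine similarity of two JSON-encoded vectors (0.0 if either is all zeros)."""
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
# Brute-force similarity as a SQL function; sqlite-vec would use a vec0 virtual table instead.
conn.create_function("cosine", 2, cosine, deterministic=True)
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, embedding TEXT)")

docs = [
    "sqlite can store vector embeddings for search",
    "pandas dataframe tips for faster analysis",
    "rag retrieves relevant text before the llm answers",
]
conn.executemany(
    "INSERT INTO docs (body, embedding) VALUES (?, ?)",
    [(d, json.dumps(embed(d))) for d in docs],
)


def retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k stored documents most similar to the question."""
    q = json.dumps(embed(question))
    rows = conn.execute(
        "SELECT body FROM docs ORDER BY cosine(embedding, ?) DESC LIMIT ?", (q, k)
    ).fetchall()
    return [body for (body,) in rows]


def build_prompt(question: str) -> str:
    """Place retrieved context into a prompt; the LLM call itself is omitted."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("vector search in sqlite"))
```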

Tutorial: Semantic Clustering of User Messages with LLM Prompts

This tutorial demonstrates how to perform semantic clustering of user messages by prompting Large Language Models (LLMs) to analyze publicly available Discord messages. It covers data extraction, sentiment scoring, KNN clustering, and visualization, and emphasizes that the process is faster and requires less effort than traditional data science approaches.

2025-02-18 Tags: semantic clustering, llm, knn, sentiment analysis, data science, vectordb, discord, solon by klotz
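
The tutorial's exact pipeline is not reproduced here. As one illustration of KNN-based clustering, the sketch below links each message embedding to its `k` nearest neighbours and treats the connected components of that graph as clusters. It then averages a per-message sentiment score within each cluster. The 2-D vectors and sentiment scores are hypothetical stand-ins for LLM-produced embeddings and scores.

```python
import math
from typing import Dict, List, Sequence


def knn_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cluster labels = connected components of a symmetric k-nearest-neighbour graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        neighbours = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        for j in neighbours:
            parent[find(j)] = find(i)  # add undirected edge i-j

    labels: Dict[int, int] = {}
    # Number components 0, 1, 2, ... in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


def mean_by_cluster(labels: List[int], scores: List[float]) -> Dict[int, float]:
    """Average a per-message score (e.g. LLM-assigned sentiment) within each cluster."""
    groups: Dict[int, List[float]] = {}
    for label, score in zip(labels, scores):
        groups.setdefault(label, []).append(score)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


# Hypothetical 2-D "embeddings" of six messages and their sentiment scores.
embeddings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
sentiment = [0.9, 0.7, 0.8, -0.5, -0.7, -0.6]

labels = knn_clusters(embeddings, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
print(mean_by_cluster(labels, sentiment))
```

The choice of `k` controls how readily groups merge. A larger `k` adds more edges and tends to produce fewer, larger clusters.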

The Big Book of Large Language Models

A comprehensive guide to Large Language Models by Damien Benveniste, covering various aspects from transformer architectures to deploying LLMs.

Language Models Before Transformers
Attention Is All You Need: The Original Transformer Architecture
A More Modern Approach To The Transformer Architecture
Multi-modal Large Language Models
Transformers Beyond Language Models
Non-Transformer Language Models
How LLMs Generate Text
From Words To Tokens
Training LLMs to Follow Instructions
Scaling Model Training
Fine-Tuning LLMs
Deploying LLMs

2025-02-11 Tags: llm, damien benveniste, machine learning, data science, book by klotz

Hex: Advanced Compute Profiles and Data Analysis Tools

Hex introduces Advanced Compute Profiles for demanding workflows, offering more CPU, RAM, and GPUs. It also features Explore, a fast, flexible no-code data analysis tool. Hex emphasizes collaboration, AI integration, and a wide range of use cases including data science, operational reporting, and self-serve data tools.

2025-02-07 Tags: hex, no-code, data analysis, data science, visualization, eda by klotz

Accurate predictions on small data with a tabular foundation model | Nature

TabPFN is a novel foundation model designed for small- to medium-sized tabular datasets, with up to 10,000 samples and 500 features.
It uses a transformer-based architecture and in-context learning (ICL) to outperform traditional gradient-boosted decision trees on these datasets.

2025-01-30 Tags: tabpfn, llm, transformer, tabular data, prediction, imputation, data science, nature by klotz

The Data Scientist’s Dilemma: Answering 'What If?' Questions Without Experiments

The article discusses methods for data scientists to answer 'what if' questions regarding the impact of actions or events without having conducted prior experiments. It focuses on creating counterfactual predictions using machine learning techniques and compares a proposed method with Google's Causal Impact. The approach involves using historical data and control groups to estimate the effect of modifications, addressing challenges such as seasonality, confounders, and temporal drift.

2025-01-11 Tags: data science, causal inference, counterfactual prediction, machine learning, causal impact, time series, forecasting by klotz

Advanced Pandas Techniques for Data Processing and Performance

The article presents 11 tips for getting more out of the Pandas library when handling and analyzing complex datasets, with the aim of improving productivity and streamlining workflows. It uses a real-world Airbnb listings dataset from Kaggle to illustrate techniques such as chunked processing and parallel execution.

2025-01-10 Tags: pandas, performance, data science, pratheesh shivaprasad by klotz
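
Here is a minimal sketch of chunked processing, which is not the article's code. `pd.read_csv(..., chunksize=...)` returns an iterator of DataFrames. Each chunk is aggregated on its own, and the partial sums and counts are then combined, so the whole file never needs to be in memory at once. The tiny inline CSV and its `neighbourhood`/`price` columns are hypothetical stand-ins for the Kaggle Airbnb data.

```python
import io

import pandas as pd

# Hypothetical stand-in for an Airbnb listings CSV (the Kaggle file is far larger).
CSV = """neighbourhood,price
Downtown,120
Harbor,80
Downtown,100
Harbor,60
Uptown,90
"""


def mean_price_by_neighbourhood(source, chunksize: int = 2) -> pd.Series:
    """Mean price per neighbourhood, reading the CSV `chunksize` rows at a time."""
    totals = None
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Partial aggregates for this chunk only.
        part = chunk.groupby("neighbourhood")["price"].agg(["sum", "count"])
        # Combine with earlier chunks; neighbourhoods missing on one side count as 0.
        totals = part if totals is None else totals.add(part, fill_value=0)
    return (totals["sum"] / totals["count"]).sort_index()


print(mean_price_by_neighbourhood(io.StringIO(CSV)))
```

The function carries sums and counts rather than per-chunk means. Averaging chunk means would weight small chunks incorrectly.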

Think Correlation Isn’t Causation? Meet Partial Correlation

Despite its power, partial correlation remains underrated in data science. This tool addresses the main limitation of simple correlation by accounting for the influence of other variables.

2025-01-09 Tags: correlation, partial correlation, data science, causation by klotz
